Loads

Libraries and functions

In [1]:
source("load_libraries.R")
Warning message in is.na(x[[i]]):
“is.na() applied to non-(list or vector) of type 'environment'”Warning message in rsqlite_fetch(res@ptr, n = n):
“Don't need to call dbFetch() for statements, only for queries”
==========================================================================
*
*  Package WGCNA 1.63 loaded.
*
*    Important note: It appears that your system supports multi-threading,
*    but it is not enabled within WGCNA in R. 
*    To allow multi-threading within WGCNA with all available cores, use 
*
*          allowWGCNAThreads()
*
*    within R. Use disableWGCNAThreads() to disable threading if necessary.
*    Alternatively, set the following environment variable on your system:
*
*          ALLOW_WGCNA_THREADS=<number_of_processors>
*
*    for example 
*
*          ALLOW_WGCNA_THREADS=4
*
*    To set the environment variable in linux bash shell, type 
*
*           export ALLOW_WGCNA_THREADS=4
*
*     before running R. Other operating systems or shells will
*     have a similar command to achieve the same aim.
*
==========================================================================


Allowing multi-threading with up to 4 threads.
[1] "preparing gene to GO mapping data..."
[1] "preparing IC data..."
[1] "preparing gene to GO mapping data..."
[1] "preparing IC data..."
[1] "preparing gene to GO mapping data..."
[1] "preparing IC data..."
In [2]:
source("functions.R")

Data

In [3]:
load("../results/dge/gene_length.RData")
load("../results/dge/filtered_metadata.RData")
load("../results/dge/dge.RData")
load("../results/dge/contrasts.RData")
load("../results/dge/filtered_norm_counts.RData")
load("../results/dge/filtered_z_scores.RData")
load("../results/dge/dge_net_pal2.RData")
load("../results/dge/col_order.RData")
load("../results/dge/annot_col.RData")
load("../results/dge/annot_colors.RData")
load("../results/dge/genes_in_modules.RData")
load("../results/dge/all_deg_genes.RData")
In [4]:
dir_path = "microbiota-effect/microbiota-sex-age/"

Extraction of the differentially expressed genes

Extract DEG between GF and SPF for the different ages and sex combinations

  • Threshold for adjusted p-value: 0.05
  • Threshold for adjusted significant fold change: 1.5

Table with the factors

In [5]:
sub_contrasts = contrasts %>%
    filter(Info == 'GF vs SPF (Female, Young)' | 
           Info == 'GF vs SPF (Male, Young)' | 
           Info == 'GF vs SPF (Female, Middle-aged)' |  
           Info == 'GF vs SPF (Male, Middle-aged)' |  
           Info == 'GF vs SPF (Female, Old)' |  
           Info == 'GF vs SPF (Male, Old)')
sub_contrasts
InfoInterceptMale vs FemaleGF vs SPFMiddle-aged vs YoungOld vs YoungMale & Middle-agedMale & OldMale & GFGF & Middle-agedGF & Old
GF vs SPF (Female, Young) 0 0 1 0 0 0 0 0 0 0
GF vs SPF (Male, Young) 0 0 1 0 0 0 0 1 0 0
GF vs SPF (Female, Middle-aged)0 0 1 0 0 0 0 0 1 0
GF vs SPF (Male, Middle-aged) 0 0 1 0 0 0 0 1 1 0
GF vs SPF (Female, Old) 0 0 1 0 0 0 0 0 0 1
GF vs SPF (Male, Old) 0 0 1 0 0 0 0 1 0 1
In [6]:
deg_results = lapply(sub_contrasts$Info, function(x) get_dge_results(x, dge, sub_contrasts))
names(deg_results) = sub_contrasts$Info

Extract the log2FC of the DEG

In [7]:
deg = extract_DEG_log2FC(deg_results, dir_path)
In [8]:
all_deg_genes = unique(c(all_deg_genes, deg$sign_fc_deg$genes))
save(all_deg_genes, file="../results/dge/all_deg_genes.RData")

Stats about the DEG

In [9]:
deg = extract_DEG_stats(deg, dir_path)
deg$stat
Using type as id variables
GF vs SPF (Female, Young)GF vs SPF (Male, Young)GF vs SPF (Female, Middle-aged)GF vs SPF (Male, Middle-aged)GF vs SPF (Female, Old)GF vs SPF (Male, Old)
All DEG (Wald padj < 0.05)256 127 877 426 19303329
All over-expressed genes (Wald padj < 0.05 & FC > 0)146 80 375 171 8661496
All under-expressed genes (Wald padj < 0.05 & FC < 0)110 47 502 255 10641833
DEG (Wald padj < 0.05 & abs(FC) >= 1.5)145 89 352 244 9991265
Over-expressed genes (Wald padj < 0.05 & FC >= 1.5) 92 58 136 84 311 324
Under-expressed genes (Wald padj < 0.05 & FC <= -1.5) 53 31 216 160 688 941

All DEG (Wald padj < 0.05)

In [10]:
plot_sign_DEG_upset(deg)

DEG (Wald padj < 0.05 & abs(FC) > 1.5)

In [11]:
pdf(paste('../results/dge/', dir_path, '/sign_FC_DEG_upset.pdf', sep=''))
plot_sign_FC_DEG_upset(deg)
dev.off()
plot_sign_FC_DEG_upset(deg)
pdf: 2

DEG with significant p-value and fold change

Log2FC

In [12]:
data = deg$sign_fc_deg %>% select(-genes)
fc_annot = data.frame(Comparison = rep("GF VS SPF",2),
                      Sex = rep(c("Female","Male"), 3),
                      Age = c(rep("Young", 2), rep("Middle-aged", 2), rep("Old", 2)),
                      row.names = colnames(data))
fc_annot
plot_fc_heatmap(data, fc_annot)
ComparisonSexAge
GF vs SPF (Female, Young)GF VS SPF Female Young
GF vs SPF (Male, Young)GF VS SPF Male Young
GF vs SPF (Female, Middle-aged)GF VS SPF Female Middle-aged
GF vs SPF (Male, Middle-aged)GF VS SPF Male Middle-aged
GF vs SPF (Female, Old)GF VS SPF Female Old
GF vs SPF (Male, Old)GF VS SPF Male Old

Z-score

In [13]:
comps = list(
    "GF vs SPF (Female, Young)" = c(grep("SPF_8w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("GF_8w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "GF vs SPF (Female, Middle-aged)" = c(grep("SPF_52w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("GF_52w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "GF vs SPF (Female, Old)" = c(grep("SPF_104w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("GF_104w_F_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "GF vs SPF (Male, Young)" = c(grep("SPF_8w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("GF_8w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "GF vs SPF (Male, Middle-aged)" = c(grep("SPF_52w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("GF_52w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE)),
    "GF vs SPF (Male, Old)" = c(grep("SPF_104w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE),
                  grep("GF_104w_M_+", colnames(norm_counts), perl=TRUE, value=TRUE))
)

Column order: sex - age - microbiote

In [14]:
plot_z_score_heatmap(z_scores,
                     deg$sign_fc_deg$genes,
                     col_order$sam,
                     annot_col$sam,
                     "All DE genes in GF vs SPF for any sex and age",
                     col_order$sam)
pdf(paste('../results/dge/', dir_path, '/z_score_sam.pdf', sep=''))
plot_z_score_heatmap(z_scores,
                     deg$sign_fc_deg$genes,
                     col_order$sam,
                     annot_col$sam,
                     "All DE genes in GF vs SPF for any sex and age",
                     col_order$sam)
dev.off()
pdf: 2
In [15]:
for(comp in names(comps)){
    plot_z_score_heatmap(z_scores,
        deg$sign_fc_deg %>% filter(!is.na((!!as.name(comp)))) %>% pull(genes),
        col_order$sam,
        annot_col$sam,
        paste("DE genes in", comp),
        comps[[comp]])
}

Column order: age - sex - microbiote

In [16]:
plot_z_score_heatmap(z_scores,
                     deg$sign_fc_deg$genes,
                     col_order$asm,
                     annot_col$asm,
                     "All DE genes in GF vs SPF for any sex and age",
                     col_order$asm)
pdf(paste('../results/dge/', dir_path, '/z_score_asm.pdf', sep=''))
plot_z_score_heatmap(z_scores,
                     deg$sign_fc_deg$genes,
                     col_order$asm,
                     annot_col$asm,
                     "All DE genes in GF vs SPF for any sex and age",
                     col_order$asm)
dev.off()
pdf: 2
In [17]:
for(comp in names(comps)){
    plot_z_score_heatmap(z_scores,
        deg$sign_fc_deg %>% filter(!is.na((!!as.name(comp)))) %>% pull(genes),
        col_order$asm,
        annot_col$asm,
        paste("DE genes in", comp),
        comps[[comp]])
}

Co-expression (WGCNA)

Z-score in modules

In [18]:
comps = names(comps)

Column order: sex - age - microbiote

In [19]:
plot_z_score_heatmap_with_modules(z_scores,
                                  rownames(z_scores),
                                  col_order$sam,
                                  annot_col$sam,
                                  genes_in_modules,
                                  "All genes")
In [20]:
for(comp in comps){
    plot_z_score_heatmap_with_modules(z_scores,
        deg$sign_fc_deg %>% filter(!is.na((!!as.name(comp)))) %>% pull(genes),
        col_order$sam,
        annot_col$sam,
        genes_in_modules,
        paste("DE genes in", comp))
}

Column order: age - sex - microbiote

In [21]:
plot_z_score_heatmap_with_modules(z_scores,
    rownames(z_scores),
    col_order$asm,
    annot_col$asm,
    genes_in_modules,
    "All genes")
In [22]:
for(comp in comps){
    plot_z_score_heatmap_with_modules(z_scores,
        deg$sign_fc_deg %>% filter(!is.na((!!as.name(comp)))) %>% pull(genes),
        col_order$asm,
        annot_col$asm,
        genes_in_modules,
        paste("DE genes in", comp))
}

Genes in modules

In [23]:
for(comp in comps){
    plot_top_deg_in_modules(deg$sign_fc_deg, comp, genes_in_modules)
}
options(repr.plot.width=7, repr.plot.height=7)

Enrichment analysis

In [24]:
deg = fit_proba_weighting_function(deg, gene_length)
Warning message in pcls(G):
“initial point very close to some inequality constraints”Warning message in pcls(G):
“initial point very close to some inequality constraints”Warning message in pcls(G):
“initial point very close to some inequality constraints”Warning message in pcls(G):
“initial point very close to some inequality constraints”Warning message in pcls(G):
“initial point very close to some inequality constraints”

GO analysis

In [25]:
deg = extract_GO_terms(deg, dir_path)
Warning message in stack.default(getgo(l$sign_fc_deg$genes, "mm10", "geneSymbol")):
“non-vector elements will be ignored”

Biological process

Dot-plot with the 10 most significant p-values for the different comparison

In [26]:
plot_top_go(deg, "BP", 10)
Using category as id variables
Using category, type as id variables
Warning message:
“Column `variable` joining factors with different levels, coercing to character vector”

Cellular components

Dot-plot with the most over-represented CC GO (20 most significant p-values for the different comparison)

In [27]:
plot_top_go(deg, "CC", 10)
Using category as id variables
Using category, type as id variables
Warning message:
“Column `variable` joining factors with different levels, coercing to character vector”

Molecular functions

Dot-plot with the most over-represented MF GO (20 most significant p-values for the different comparison)

In [28]:
plot_top_go(deg, "MF", 10)
Using category as id variables
Using category, type as id variables
Warning message:
“Column `variable` joining factors with different levels, coercing to character vector”

GO networks

The edge colors in the tree represent the relationship between two nodes. - green: positively regulates

  • red: negatively regulates
  • black: regulates
  • blue: is a
  • light blue: part of

GO Trees at "../results/dge/microbiote-effect/microbiote-sex-age/go/"

In [29]:
for(comp in comps){
    print(comp)
    fn = gsub(' ', '_', comp)
    fn = gsub('(\\(|\\)|,)', '', fn)
    fp = paste('../results/dge/', dir_path, 'go/', fn, '.png', sep='')
    col = get_GO_network_col_all_ont(deg$GO, comp)
    dotRes = getAmigoTree(goIDs=col$category,
                          color=col$color,
                          filename=fp,
                          picType="png",
                          saveResult=TRUE)
}
[1] "GF vs SPF (Female, Young)"
Warning message:
“Column `values` has different attributes on LHS and RHS of join”Warning message:
“Column `values` has different attributes on LHS and RHS of join”
[1] "GF vs SPF (Female, Middle-aged)"
Warning message:
“Column `values` has different attributes on LHS and RHS of join”Warning message:
“Column `values` has different attributes on LHS and RHS of join”
[1] "GF vs SPF (Female, Old)"
Warning message:
“Column `values` has different attributes on LHS and RHS of join”Warning message:
“Column `values` has different attributes on LHS and RHS of join”
[1] "GF vs SPF (Male, Young)"
Warning message:
“Column `values` has different attributes on LHS and RHS of join”Warning message:
“Column `values` has different attributes on LHS and RHS of join”
[1] "GF vs SPF (Male, Middle-aged)"
Warning message:
“Column `values` has different attributes on LHS and RHS of join”Warning message:
“Column `values` has different attributes on LHS and RHS of join”
[1] "GF vs SPF (Male, Old)"
Warning message:
“Column `values` has different attributes on LHS and RHS of join”Warning message:
“Column `values` has different attributes on LHS and RHS of join”

KEGG pathways

In [30]:
deg = extract_KEGG_pathways(deg, dir_path)
Warning message in stack.default(getgo(l$sign_fc_deg$genes, "mm10", "geneSymbol", :
“non-vector elements will be ignored”
In [31]:
plot_kegg_pathways(deg$KEGG$over$category,
                   deg$sign_fc_deg,
                   paste('../results/dge/', dir_path, '/kegg/over_repr_kegg/', sep=''))
Error in `$<-.data.frame`(`*tmp*`, labels, value = c("", "", "", "", "", : replacement has 253 rows, data has 346
Traceback:

1. plot_kegg_pathways(deg$KEGG$over$category, deg$sign_fc_deg, paste("../results/dge/", 
 .     dir_path, "/kegg/over_repr_kegg/", sep = ""))
2. suppressMessages(pathview(gene.data = fc_deg, pathway.id = cat, 
 .     species = "Mus musculus", gene.idtype = "Symbol"))
3. withCallingHandlers(expr, message = function(c) invokeRestart("muffleMessage"))
4. pathview(gene.data = fc_deg, pathway.id = cat, species = "Mus musculus", 
 .     gene.idtype = "Symbol")
5. `$<-`(`*tmp*`, labels, value = c("", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", ""))
6. `$<-.data.frame`(`*tmp*`, labels, value = c("", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
 . "", "", "", "", "", "", "", ""))
7. stop(sprintf(ngettext(N, "replacement has %d row, data has %d", 
 .     "replacement has %d rows, data has %d"), N, nrows), domain = NA)

Pathway graphs available at ../results/dge/type-effect/type_gender_age/over_repr_kegg/

In [ ]:
plot_kegg_pathways(deg$under_represented_KEGG[,"category"],
                   deg$fc_deg,
                   paste('../results/dge/', dir_path, '/kegg/under_repr_kegg/', sep=''))

Pathway graphs available at ../results/dge/type-effect/type_gender_age/under_repr_kegg/

Citations

In [ ]:
citation("pheatmap")
In [ ]:
citation("RamiGO")
In [ ]:
citation("pathview")